Semi-Lazy Learning: Combining Clustering and Classifiers to Build More Accurate Models

Author

  • Ian Davidson

Abstract

Eager learners such as neural networks, decision trees, and naïve Bayes classifiers construct a single model from the training data before observing any test instances. In contrast, lazy learners such as K-nearest neighbor wait until a test instance is presented before generalizing beyond the training data. This allows predictions to be made from only the training instances most similar to the test instance, an approach that has been shown to outperform eager learners on a number of problems. However, lazy learners require storing and querying the entire training data set for each test instance, which is impractical for the large amounts of data typical of many applications. We introduce and illustrate the benefits of a semi-lazy learning approach that combines clustering and classification models and retains the benefits of both eager and lazy learners. We propose dividing the instances using cluster-based segmentation and then using an eager learner to build a classification model for each cluster. This has the effect of dividing the instance space into a number of distinct regions and building a local model for each. Our experiments on UCI data sets show that clustering the data into segments and then building a classification model for each segment, using a variety of eager learners, often yields greater overall cross-validated accuracy than building a single global model or using a pure lazy approach such as K-nearest neighbor. Our semi-lazy approach can also be viewed as an example of the divide-and-conquer (DAC) strategy used in many scientific fields to decompose a complex problem into a set of simpler ones. Finally, we find that misclassified instances are more likely to be outliers with respect to the clustering segmentation.
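To make the procedure concrete, the sketch below implements the cluster-then-classify idea under stated assumptions: k-means for the segmentation, a decision tree as the per-cluster eager learner, and routing of each test instance to the model of its nearest centroid. The cluster count, the base learner, and the class name SemiLazyClassifier are illustrative choices, not details taken from the paper.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

class SemiLazyClassifier:
    """Segment training data with k-means, then fit one eager model per segment."""

    def __init__(self, n_clusters=5, base_learner=None):
        self.n_clusters = n_clusters
        self.base_learner = base_learner if base_learner is not None else DecisionTreeClassifier()

    def fit(self, X, y):
        # Divide the instance space into distinct regions (the clustering step).
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10, random_state=0)
        labels = self.kmeans_.fit_predict(X)
        # Build a local classification model for each region (the eager step).
        self.models_ = {c: clone(self.base_learner).fit(X[labels == c], y[labels == c])
                        for c in np.unique(labels)}
        return self

    def predict(self, X):
        # Route each test instance to the model of its nearest centroid.
        labels = self.kmeans_.predict(X)
        preds = np.empty(len(X), dtype=object)
        for c in np.unique(labels):
            preds[labels == c] = self.models_[c].predict(X[labels == c])
        return preds
```

Because kmeans_.transform(X) returns each instance's distance to every centroid, the distance to the assigned centroid can serve as a simple outlier score for probing the abstract's final observation that misclassified instances tend to be outliers with respect to the segmentation.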


Similar articles

Combining Classifier Guided by Semi-Supervision

This article proposes an algorithm for a classifier ensemble methodology based on possibilistic aggregation to classify samples. The proposed method optimizes an objective function that combines an environment-recognition term, a multi-criteria aggregation term, and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...

Combining Classifiers in Multimodal Affect Detection

Affect detection, in which users' mental states are automatically recognized from facial expressions, speech, physiology, and other modalities, requires accurate machine learning and classification techniques. This paper investigates how combined classifiers, and their base classifiers, can be used in affect detection using features from facial video and multichannel physiology. The base classifiers...


A Lazy Ensemble Learning Method to Classification

Depending on how a learner reacts to test instances, supervised learning is divided into eager learning and lazy learning. Lazy learners endeavor to find a locally optimal solution for each particular test instance. Many approaches to constructing lazy learners have been developed; one successful approach is to incorporate lazy learning into ensemble classification. Almost all lazy learning sc...

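The excerpt above is cut off, so the following is only a generic illustration of what combining lazy learning with an ensemble can look like, not the method of that paper: for each test instance, restrict the training data to its k nearest neighbors and fit a small committee of base learners on that neighborhood. The neighborhood size, the choice of base learners, and the function name lazy_ensemble_predict are all assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

def lazy_ensemble_predict(X_train, y_train, x_test, k=25,
                          base_learners=(DecisionTreeClassifier(), GaussianNB())):
    # Defer all model building until the test instance arrives (lazy step).
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]
    X_local, y_local = X_train[idx], y_train[idx]
    # Fit the committee only on the local neighborhood and majority-vote.
    votes = [clone(m).fit(X_local, y_local).predict(x_test.reshape(1, -1))[0]
             for m in base_learners]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```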

Using Negative Correlation Learning to Improve the Performance of Neural Network Ensembles

This paper investigates the effect of the diversity induced by Negative Correlation Learning (NCL) in combinations of neural classifiers and presents an efficient way to improve combining performance. Decision Templates and Averaging, as two non-trainable combining methods, and Stacked Generalization, as a trainable combiner, are investigated in our experiments. Utilizing NCL for diversifying the ba...

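As background for the excerpt above (this is the standard NCL formulation due to Liu and Yao, not something stated in the truncated abstract), each network i in the ensemble is trained on its own squared error plus a penalty that pushes its output away from the rest of the ensemble:

```latex
% NCL error for network i over N training points; d(n) is the target,
% \bar{F}(n) the ensemble's average output, and \lambda \ge 0 trades
% individual accuracy against ensemble diversity.
E_i = \frac{1}{N} \sum_{n=1}^{N} \left[ \tfrac{1}{2}\bigl(F_i(n) - d(n)\bigr)^2
      + \lambda \, p_i(n) \right],
\qquad
p_i(n) = \bigl(F_i(n) - \bar{F}(n)\bigr) \sum_{j \neq i} \bigl(F_j(n) - \bar{F}(n)\bigr).
```

With lambda = 0 each network is trained independently; increasing lambda explicitly decorrelates the networks' errors, which is the diversity effect the paper studies.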


Publication date: 2003